NHGH analysis

Katrine Meldgård, Margrethe Bøe Lysø, Kristine Rosted Petersen, Enrico Leonardi and Pernille Jensen

Introduction

NHANES glycohemoglobin data

  • National Health and Nutrition Examination Survey

Diabetes Mellitus (DM)

  • Type 1 Diabetes: Deficient production of insulin.

  • Type 2 Diabetes: Ineffective use of insulin.

  • About 422 million people living with diabetes worldwide; 1.5 million deaths each year

Aim

  • Correlation between biomarkers/measurements and diabetes
  • Possibility of regaining normal values after medication
  • How income class influences diabetes and medication status

Method

Data set contained X observations with X variables after cleaning

Data Wrangling

Added

Descriptive analysis

Observations: 6795
Variables (augmented): 26
Diagnosed: 914
Medicated: 607
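
The counts above can be turned into shares of the data set. This is a minimal sketch that uses only the three numbers listed here. The variable names are illustrative, and both shares are taken relative to all observations, since the slide does not say whether every medicated individual is also diagnosed.

```python
# Counts copied from the descriptive summary above.
n_obs = 6795
n_diagnosed = 914
n_medicated = 607

# Shares relative to all observations (no assumption that the medicated
# group is a subset of the diagnosed group).
share_diagnosed = n_diagnosed / n_obs
share_medicated = n_medicated / n_obs

print(f"Diagnosed: {share_diagnosed:.1%} of all observations")
print(f"Medicated: {share_medicated:.1%} of all observations")
```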

Income vs. medication

Biomarkers and diagnosis/medication status

Physical attributes and disease/medication status

PCA

  • Data
    • Non-medicated individuals
  • Classes not separated
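
A PCA step of this kind might look like the sketch below. It assumes the features are standardized first and uses synthetic data, not the NHANES measurements. The helper `pca` and all variable names are hypothetical, and this is not necessarily how the analysis was implemented.

```python
import numpy as np

def pca(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    # Standardize each column so no measurement dominates because of its unit.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized data: the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]

# Synthetic stand-in data: two classes drawn from the same distribution,
# so they overlap completely in feature space.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)

scores, explained = pca(X)
print("Explained variance ratio:", explained.round(3))
# Comparing class means along PC1 gives a rough check of class separation.
print("PC1 mean, class 0:", scores[y == 0, 0].mean().round(3))
print("PC1 mean, class 1:", scores[y == 1, 0].mean().round(3))
```

When the class means along the first components are close relative to the spread of the scores, the classes are not separated in PC space.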

Logistic regression model

  • Backwards selection:
    • Weight
    • Leg
    • Waist
    • Creatinine
    • Glycohemoglobin
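
Backward selection for a logistic regression can be sketched as follows. The slide does not state the selection criterion, so this sketch assumes AIC. The data are synthetic, and the helpers `fit_logistic`, `aic` and `backward_select` and the predictor names are hypothetical. It does not reproduce the variable list above.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Fit logistic regression by Newton's method; return (coef, log-likelihood)."""
    Xd = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (y - p)
        hess = (Xd * (p * (1 - p))[:, None]).T @ Xd
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-Xd @ beta)), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik

def aic(X, y, cols):
    """AIC of the model using the predictor columns in cols plus an intercept."""
    _, loglik = fit_logistic(X[:, cols], y)
    return 2 * (len(cols) + 1) - 2 * loglik

def backward_select(X, y, names):
    """Start from the full model; drop one predictor at a time while AIC improves."""
    cols = list(range(X.shape[1]))
    best = aic(X, y, cols)
    while len(cols) > 1:
        # AIC of every candidate model with exactly one predictor removed.
        trials = [(aic(X, y, [c for c in cols if c != d]), d) for d in cols]
        trial_aic, drop = min(trials)
        if trial_aic >= best:
            break  # no removal improves the model
        best = trial_aic
        cols.remove(drop)
    return [names[c] for c in cols]

# Synthetic data: x1 and x2 drive the outcome, the other two are pure noise.
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 4))
names = ["x1", "x2", "noise1", "noise2"]
logit = -1.0 + 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

selected = backward_select(X, y, names)
print("Retained predictors:", selected)
```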

Classification

  • Model based on parameters found by LR
  • Data
    • Non-medicated individuals
  • Few individuals predicted as 1 (diabetic)
  • AUC is low
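
The two evaluation measures above can be computed as in the sketch below. The labels and predicted probabilities are made up for illustration and are not the model's output. The 0.5 threshold and the helpers `confusion_matrix` and `auc` are assumptions.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    return tn, fp, fn, tp

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Count pairwise wins; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels (1 = diabetic) and predicted probabilities.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.45, 0.35, 0.60, 0.25, 0.40]

# Assumed decision threshold of 0.5.
y_pred = [int(s >= 0.5) for s in scores]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"AUC={auc(y_true, scores):.3f}")
```

With few positives in the data, a fixed 0.5 threshold tends to assign most individuals to class 0, which is one way "few predicted as 1" can arise. AUC does not depend on the threshold, so it is reported separately.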

Discussion

  • Relation between anthropometric and biomarker measurements.
  • Classification and PCA: No clear relation.
  • Confusion matrix: Few individuals predicted as diabetic.
  • Uneven distribution between diabetic and non-diabetic.
    • Improvement: Larger group of diabetic individuals.
  • Income classes and medication.

Conclusion: No clear relation to diabetes was found.